Expectation Maximization for Combined Phylogenetic and Hidden Markov Models

Author

  • Adam Siepel
Abstract

An expectation maximization (EM) algorithm is derived to estimate the parameters of a phylogenetic model, a probabilistic model of molecular evolution that considers the phylogeny, or evolutionary tree, by which a set of present-day organisms is related. The EM algorithm is then extended for use with a combined phylogenetic and hidden Markov model. An efficient method is also shown for computing gradients of the log likelihood of a phylogenetic model, which is similar in spirit to the EM algorithms but has applications beyond EM, for example in improving the efficiency of the quasi-Newton or conjugate-gradient methods that are commonly used to fit phylogenetic models. Finally, a small set of experiments is discussed in which the time to fit a phylogenetic model is compared among the EM algorithm, a standard quasi-Newton algorithm, and a modified quasi-Newton algorithm that computes gradients in the new, more efficient way. The performance of all algorithms is compared with that of PAML, a widely used software package for phylogenetic modeling. The modified quasi-Newton and EM algorithms both improve considerably on the standard quasi-Newton algorithm, and are competitive with PAML, even though little care has been given to performance tuning. The EM algorithm seems particularly promising, despite somewhat slow convergence and an occasional tendency to become “stuck” for several iterations at suboptimal parameter values. These problems may be addressed by the “entropic update” of Singer and Warmuth, which remains to be implemented and tested.
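To make concrete what "the log likelihood of a phylogenetic model" refers to, the following is a minimal sketch of the standard pruning (dynamic programming) computation on a tree. It is not the EM algorithm or the gradient method of the paper. It assumes the Jukes-Cantor substitution model, a fixed tree with known branch lengths, and ungapped columns. The tree encoding, the species names, and the function names (`jc69_prob`, `partial_likelihoods`, `log_likelihood`) are hypothetical choices made for illustration.

```python
import math

BASES = "ACGT"

def jc69_prob(a, b, t):
    """Jukes-Cantor probability that base a becomes base b along a branch
    of length t (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

def partial_likelihoods(node, column):
    """Pruning step: map each base x to P(leaves below node | node has base x).

    Assumed encoding: a leaf is a species name (str); an internal node is a
    list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {x: 1.0 if x == column[node] else 0.0 for x in BASES}
    result = {x: 1.0 for x in BASES}
    for child, t in node:
        below = partial_likelihoods(child, column)
        for x in BASES:
            result[x] *= sum(jc69_prob(x, y, t) * below[y] for y in BASES)
    return result

def log_likelihood(tree, alignment):
    """Sum of per-column log likelihoods, with a uniform base distribution
    at the root (the Jukes-Cantor equilibrium)."""
    total = 0.0
    for column in alignment:
        root = partial_likelihoods(tree, column)
        total += math.log(sum(0.25 * root[x] for x in BASES))
    return total

# Hypothetical three-species tree: ((human, chimp), mouse).
tree = [([("human", 0.1), ("chimp", 0.1)], 0.2), ("mouse", 0.5)]
alignment = [
    {"human": "A", "chimp": "A", "mouse": "A"},
    {"human": "C", "chimp": "C", "mouse": "T"},
]
print(log_likelihood(tree, alignment))
```

Any fitting procedure, whether EM or quasi-Newton, evaluates a quantity of this kind repeatedly while adjusting branch lengths and substitution parameters, which is why the cost of likelihood and gradient computations dominates the fitting time.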

Similar resources

Spatio-temporal modeling of the occurrence and amount of winter rainfall across Iran using a hidden Markov model

Multi-site modeling of rainfall is one of the most important issues in environmental sciences, especially in watershed management. For this purpose, different statistical models have been developed which involve spatial approaches in simulation and modeling of daily rainfall values. The hidden Markov model is one of the multi-site daily rainfall models which in addition to simulation of daily rainfall...

Wavelet-based Image Modelling for Compression Using Hidden Markov Model

Statistical signal modeling using a hidden Markov model is one of the techniques used for image compression. Wavelet-based statistical signal models are impractical for most real-time processing because they usually represent the wavelet coefficients as jointly Gaussian or independent of each other. In this paper, we build up an algorithm that succinctly characterizes the interdependencies...

Detecting Sporadic Recombination in DNA Alignments with Hidden Markov Models

Conventional phylogenetic tree estimation methods assume that all sites in a DNA multiple alignment have the same evolutionary history. This assumption is violated in data sets from certain bacteria and viruses due to recombination, a process that leads to the creation of mosaic sequences from different strains and, if undetected, causes systematic errors in phylogenetic tree estimation. In the...

Modeling Acoustic Correlations by Factor Analysis

Hidden Markov models (HMMs) for automatic speech recognition rely on high dimensional feature vectors to summarize the short-time properties of speech. Correlations between features can arise when the speech signal is non-stationary or corrupted by noise. We investigate how to model these correlations using factor analysis, a statistical method for dimensionality reduction. Factor analysis uses...

A probabilistic method to detect regulatory modules

MOTIVATION The discovery of cis-regulatory modules in metazoan genomes is crucial for understanding the connection between genes and organism diversity. RESULTS We develop a computational method that uses Hidden Markov Models and an Expectation Maximization algorithm to detect such modules, given the weight matrices of a set of transcription factors known to work together. Two novel features ...

Publication date: 2002